Abstract

Given a strictly increasing, continuous function $\vartheta:\mathbb{R}_+\to\mathbb{R}_+$, based on the cost functional $\int_{X\times X}\vartheta(d(x,y))\,dq(x,y)$, we define the $L^\vartheta$-Wasserstein distance $W_\vartheta(\mu,\nu)$ between probability measures $\mu,\nu$ on some metric space $(X,d)$. The function $\vartheta$ will be assumed to admit a representation $\vartheta=\varphi\circ\psi$ as a composition of a convex and a concave function $\varphi$ and $\psi$, resp. Besides convex functions and concave functions this includes all $C^2$ functions.




For such functions $\vartheta$ we extend the concept of Orlicz spaces, defining the metric space $L^\vartheta(X,m)$ of measurable functions $f:X\to\mathbb{R}$ such that, for instance,

$$d_\vartheta(f,g)\le1\iff\int_X\vartheta\big(|f(x)-g(x)|\big)\,dm(x)\le1.$$

1 Convex-Concave Compositions



Throughout this paper, $\vartheta$ will be a strictly increasing, continuous function from $\mathbb{R}_+$ to $\mathbb{R}_+$ with $\vartheta(0)=0$.

Definition 1.1. $\vartheta$ will be called a ccc function ("convex-concave composition") iff there exist two strictly increasing continuous functions $\varphi,\psi:\mathbb{R}_+\to\mathbb{R}_+$ with $\varphi(0)=\psi(0)=0$ s.t. $\varphi$ is convex, $\psi$ is concave and

$$\vartheta=\varphi\circ\psi.$$

The pair $(\varphi,\psi)$ will be called a convex-concave factorization of $\vartheta$.

The factorization is called minimal (or non-redundant) if for any other factorization $(\tilde\varphi,\tilde\psi)$ the function $\varphi^{-1}\circ\tilde\varphi$ is convex.

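Two minimal factorizations of a given function $\vartheta$ differ only by a linear change of variables. Indeed, if $\varphi^{-1}\circ\tilde\varphi$ is convex and also $\tilde\varphi^{-1}\circ\varphi$ is convex, then $u:=\varphi^{-1}\circ\tilde\varphi$ is an increasing bijection of $\mathbb{R}_+$ with $u(0)=0$ such that both $u$ and $u^{-1}$ are convex, hence linear. Thus there exists a $\lambda\in(0,\infty)$ s.t. $\tilde\varphi(t)=\varphi(\lambda t)$ and $\tilde\psi(t)=\lambda^{-1}\psi(t)$ for all $t$.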
For each convex, concave or ccc function $f:\mathbb{R}_+\to\mathbb{R}_+$ put $f'(t):=f'(t+):=\lim_{h\searrow0}\frac1h\,[f(t+h)-f(t)]$.

Lemma 1.2. (i) For any ccc function $\vartheta$, the function $\log\vartheta'$ is locally of bounded variation and the distribution $(\log\vartheta')'$ defines a signed Radon measure on $(0,\infty)$, henceforth denoted by $d(\log\vartheta')$.

(ii) A pair $(\varphi,\psi)$ of strictly increasing, continuous functions, convex and concave resp., with $\varphi(0)=\psi(0)=0$ is a factorization of $\vartheta$ iff

$$d(\log\vartheta')=d(\log\varphi'\circ\psi)+d(\log\psi') \qquad (1)$$

in the sense of signed Radon measures.

(iii) The factorization $(\varphi,\psi)$ is minimal iff for any other factorization $(\tilde\varphi,\tilde\psi)$

$$-d(\log\psi')\le-d(\log\tilde\psi')$$

in the sense of nonnegative Radon measures on $(0,\infty)$.

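(iv) Every ccc function $\vartheta$ admits a minimal factorization given by $\varphi:=\vartheta\circ\psi^{-1}$ and

$$\psi(x):=\int_0^x\exp\Big(-\int_1^y d\nu_-(z)\Big)\,dy$$

(with the usual convention $\int_1^y=-\int_y^1$ for $y<1$), where $d\nu_-(z)$ denotes the negative part of the Radon measure $d\nu(z)=d(\log\vartheta')(z)$.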
Proof. (i), (ii): The chain rule for convex/concave functions yields

$$\vartheta'=(\varphi'\circ\psi)\cdot\psi'$$

for each factorization $(\varphi,\psi)$ of a ccc function $\vartheta$. Taking logarithms, it follows that $\log\vartheta'$ is locally a BV function (as the difference of the two increasing functions $\log\varphi'\circ\psi$ and $-\log\psi'$) and, hence, that the associated Radon measures satisfy

$$d(\log\vartheta')=d(\log\varphi'\circ\psi)+d(\log\psi')=(\psi^{-1})_*\,d(\log\varphi')+d(\log\psi'),$$

where $(\psi^{-1})_*$ denotes the image of a measure under $\psi^{-1}$.

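(iii): The factorization $(\varphi,\psi)$ is minimal if and only if for any other factorization $(\tilde\varphi,\tilde\psi)$ the function $u=\varphi^{-1}\circ\tilde\varphi=\psi\circ\tilde\psi^{-1}$ is convex. Since $\log\psi'=\log u'(\tilde\psi)+\log\tilde\psi'$ and $\tilde\psi$ is increasing, the latter is equivalent to

$$d(\log\psi')\ge d(\log\tilde\psi'),$$

which is the claim.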

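(iv): Define $\psi$ as above. It remains to verify that $\psi(x)<\infty$ for all $x\ge0$. Let $(\tilde\varphi,\tilde\psi)$ be any convex-concave factorization of $\vartheta$. Without restriction assume $\tilde\psi'(1)=1$. In (1), applied to $(\tilde\varphi,\tilde\psi)$, the measure $d(\log\tilde\varphi'\circ\tilde\psi)$ is nonnegative and $d(\log\tilde\psi')$ is nonpositive. Hence the Hahn decomposition of (1) yields

$$d\nu_-\le-d(\log\tilde\psi'). \qquad (2)$$

Hence, for all $0<x\le1$

$$0<\psi(x)=\int_0^x\exp\Big(\int_y^1d\nu_-(z)\Big)\,dy\le\int_0^x\exp\Big(-\int_y^1d(\log\tilde\psi')(z)\Big)\,dy=\tilde\psi(x)<\infty.$$

This already implies that $\psi$ is finite, strictly increasing and continuous on $[0,\infty)$. (For instance, for $x>1$ it follows that $\psi(x)\le\psi(1)+x-1$.) Moreover, one easily verifies that $\psi$ is concave. By the chain rule, $d(\log\varphi'\circ\psi)=d\nu+d\nu_-=d\nu_+\ge0$ for $\varphi:=\vartheta\circ\psi^{-1}$, so $\varphi$ is convex.

Since $\nu_+,\nu_-$ are the minimal nonnegative measures in the ("Hahn" or "Jordan") decomposition of $\nu=\nu_+-\nu_-$, it follows from (2) and (iii) that $(\vartheta\circ\psi^{-1},\psi)$ is a minimal convex-concave factorization of $\vartheta$. □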

Examples 1.3. • Each convex function $\vartheta$ is a ccc function. A minimal factorization is given by $(\vartheta,\mathrm{Id})$.

• Each concave function $\vartheta$ is a ccc function. A minimal factorization is given by $(\mathrm{Id},\vartheta)$.

• Each $C^2$ function $\vartheta$ with $\vartheta'(0+)>0$ is a ccc function. The minimal factorization is given by

$$\psi(r):=\int_0^r\exp\Big(-\int_1^s\Big(\frac{\vartheta''(z)}{\vartheta'(z)}\Big)_-\,dz\Big)\,ds$$

and $\varphi:=\vartheta\circ\psi^{-1}$. (The condition $\vartheta'(0+)>0$ can be replaced by the strictly weaker requirement that the previous integral defining $\psi$ is finite.)
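The construction in Lemma 1.2 (iv) and in the last example can be checked numerically. What follows is a minimal sketch, not part of the original argument. Its assumptions are the hypothetical test function $\vartheta(r)=r-\frac12\sin r$ (chosen because $\vartheta''$ changes sign), a uniform grid on $[0,20]$, and the trapezoidal rule. It computes $\log\psi'$ from the negative part of $\vartheta''/\vartheta'$, normalized so that $\psi'(1)=1$. It then checks two things: that $\psi'$ is nonincreasing, so $\psi$ is concave, and that $\varphi'\circ\psi=\vartheta'/\psi'$ is nondecreasing up to discretization error, so $\varphi$ is convex.

```python
import math

# Hypothetical test function: theta(r) = r - sin(r)/2 is C^2 and strictly
# increasing (theta' >= 1/2), but neither convex nor concave.
def theta_prime(r):
    return 1.0 - 0.5 * math.cos(r)

def theta_second(r):
    return 0.5 * math.sin(r)

def minimal_factorization_logs(grid):
    """Return (log psi'(s), log phi'(psi(s))) for s in grid.

    grid: increasing list of points in [0, R] that contains 1.0.
    log psi'(s) = -int_1^s (theta''/theta')_-(z) dz (trapezoidal rule), so psi'(1) = 1;
    log phi'(psi(s)) = log theta'(s) - log psi'(s) (chain rule).
    """
    neg_part = [max(-theta_second(z) / theta_prime(z), 0.0) for z in grid]
    cum = [0.0]  # cum[i] approximates int_{grid[0]}^{grid[i]} (theta''/theta')_- dz
    for i in range(1, len(grid)):
        h = grid[i] - grid[i - 1]
        cum.append(cum[-1] + 0.5 * h * (neg_part[i] + neg_part[i - 1]))
    i_one = grid.index(1.0)
    log_psi_p = [cum[i_one] - c for c in cum]
    log_phi_p = [math.log(theta_prime(s)) - lp for s, lp in zip(grid, log_psi_p)]
    return log_psi_p, log_phi_p

grid = [k / 100 for k in range(2001)]  # [0, 20] with step 0.01
log_psi_p, log_phi_p = minimal_factorization_logs(grid)
# psi' nonincreasing (psi concave); phi' o psi nondecreasing up to O(h^3) per step (phi convex).
print(all(b <= a for a, b in zip(log_psi_p, log_psi_p[1:])))
print(min(b - a for a, b in zip(log_phi_p, log_phi_p[1:])))
```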

2 The Metric Space 

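Let $(X,\Sigma,\mu)$ be a $\sigma$-finite measure space and $(\varphi,\psi)$ a minimal ccc factorization of a given function $\vartheta$. Then $L^\vartheta(X,\mu)$ will denote the space of all measurable functions $f:X\to\mathbb{R}$ such that

$$\int_X\varphi\Big(\frac1t\psi(|f|)\Big)\,d\mu<\infty$$

for some $t\in(0,\infty)$, where as usual functions which agree almost everywhere are identified. Note that the previous condition is equivalent to the condition $\int_X\varphi\big(\frac1t\psi(|f|)\big)\,d\mu\le1$ for some $t\in(0,\infty)$. This is due to the fact that $r\mapsto\varphi(r)$ grows at least linearly for large $r$; more precisely, convexity and $\varphi(0)=0$ give $\varphi(r/\lambda)\le\varphi(r)/\lambda$ for all $\lambda\ge1$.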

Theorem 2.1. $L^\vartheta(X,\mu)$ is a complete metric space with the metric

$$d_\vartheta(f,g)=\inf\Big\{t\in(0,\infty):\int_X\varphi\Big(\frac1t\psi(|f-g|)\Big)\,d\mu\le1\Big\}.$$
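The definition of this metric does not depend on the choice of the minimal ccc factorization of the function $\vartheta$. Indeed, two minimal factorizations are related by $\tilde\varphi(r)=\varphi(\lambda r)$ and $\tilde\psi(r)=\lambda^{-1}\psi(r)$, and then $\tilde\varphi\big(\frac1t\tilde\psi\big)=\varphi\big(\frac1t\psi\big)$. However, choosing an arbitrary convex-concave factorization of $\vartheta$ might change the value of $d_\vartheta$.

Note that always $d_\vartheta(f,g)=d_\vartheta(f-g,0)$.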

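Proof. Let $f,g,h\in L^\vartheta(X,\mu)$ be given and choose $r,s>0$ with $d_\vartheta(f,g)<r$ and $d_\vartheta(g,h)<s$. The latter implies

$$\int_X\varphi\Big(\frac1r\psi(|f-g|)\Big)\,d\mu\le1,\qquad\int_X\varphi\Big(\frac1s\psi(|g-h|)\Big)\,d\mu\le1.$$

Concavity of $\psi$ (together with $\psi(0)=0$ and monotonicity) yields $\psi(|f-h|)\le\psi(|f-g|)+\psi(|g-h|)$. Put $t=r+s$. Then convexity of $\varphi$ implies

$$\varphi\Big(\frac1t\psi(|f-h|)\Big)\le\varphi\Big(\frac rt\cdot\frac1r\psi(|f-g|)+\frac st\cdot\frac1s\psi(|g-h|)\Big)\le\frac rt\,\varphi\Big(\frac1r\psi(|f-g|)\Big)+\frac st\,\varphi\Big(\frac1s\psi(|g-h|)\Big).$$

Hence,

$$\int_X\varphi\Big(\frac1t\psi(|f-h|)\Big)\,d\mu\le\frac rt\int_X\varphi\Big(\frac1r\psi(|f-g|)\Big)\,d\mu+\frac st\int_X\varphi\Big(\frac1s\psi(|g-h|)\Big)\,d\mu\le\frac rt\cdot1+\frac st\cdot1=1$$

and thus $d_\vartheta(f,h)\le t$. This proves that $d_\vartheta(f,h)\le d_\vartheta(f,g)+d_\vartheta(g,h)$.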

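In order to prove the completeness of the metric, let $(f_n)_n$ be a Cauchy sequence in $L^\vartheta(X,\mu)$. Then $d_\vartheta(f_n,f_m)<\varepsilon_n$ for all $n,m$ with $m>n$ and suitable $\varepsilon_n\searrow0$. Choose an increasing sequence of measurable sets $X_k$, $k\in\mathbb{N}$, with $0<\mu(X_k)<\infty$ and $\bigcup_kX_k=X$. Then

$$\int_{X_k}\varphi\Big(\frac1{\varepsilon_n}\psi(|f_n-f_m|)\Big)\,d\mu\le1$$

for all $k,m,n$ with $m>n$. Jensen's inequality (applied to the probability measure $\mu/\mu(X_k)$ on $X_k$) implies

$$\varphi\Big(\frac1{\varepsilon_n\,\mu(X_k)}\int_{X_k}\psi(|f_n-f_m|)\,d\mu\Big)\le\frac1{\mu(X_k)}$$

and thus

$$\int_{X_k}\psi(|f_n-f_m|)\,d\mu\le\varepsilon_n\cdot\mu(X_k)\cdot\varphi^{-1}\Big(\frac1{\mu(X_k)}\Big).$$

Since $\psi$ is strictly increasing with $\psi(0)=0$, this means that $(f_n)_n$ is a Cauchy sequence in $\mu$-measure on $X_k$. It follows that it has a subsequence $(f_{n_i})_i$ which converges $\mu$-almost everywhere on $X_k$ towards some limiting function $f$. By a diagonal argument, the subsequence and $f$ can be chosen independently of $k$.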
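Finally, Fatou's lemma now implies

$$\int_{X_k}\varphi\Big(\frac1{\varepsilon_n}\psi(|f_n-f|)\Big)\,d\mu\le\liminf_{i\to\infty}\int_{X_k}\varphi\Big(\frac1{\varepsilon_n}\psi(|f_n-f_{n_i}|)\Big)\,d\mu\le1$$

for each $k$ and $n$. Hence, letting $k\to\infty$ (monotone convergence),

$$\int_X\varphi\Big(\frac1{\varepsilon_n}\psi(|f_n-f|)\Big)\,d\mu\le1,$$

that is, $f\in L^\vartheta(X,\mu)$ and $d_\vartheta(f_n,f)\le\varepsilon_n\to0$, which proves the claim.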

It remains to verify that

$$d_\vartheta(f,g)=0\iff f=g\quad\mu\text{-a.e. on }X.$$

The implication "$\Leftarrow$" is trivial. For the reverse implication, we may argue as in the previous completeness proof. $d_\vartheta(f,g)=0$ yields $\int_{X_k}\varphi\big(\frac1t\psi(|f-g|)\big)\,d\mu\le1$ for all $k\in\mathbb{N}$ and all $t>0$. By Jensen's inequality as above, this in turn implies $\int_{X_k}\psi(|f-g|)\,d\mu=0$. The latter proves $f=g$ $\mu$-a.e. on $X$, which is the claim. □
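On a finite measure space the metric of Theorem 2.1 can be evaluated directly. The integral is nonincreasing in $t$, so $d_\vartheta$ can be found by bisection. The sketch below is illustrative only. It checks the triangle inequality on random data, and it checks that replacing $(\varphi,\psi)$ by $(\varphi(\lambda\,\cdot),\psi/\lambda)$ does not change $d_\vartheta$. Its assumptions are the pair $\varphi(r)=r^2$, $\psi(r)=\log(1+r)$, the bisection bracket $[10^{-12},10^{12}]$, the helper name `d_theta` and the random data. The triangle-inequality argument in the proof only uses convexity of $\varphi$ and concavity of $\psi$, so minimality of this pair is not needed for that check.

```python
import math
import random

def d_theta(f, g, weights, phi, psi, iters=200):
    """Approximate d_theta(f, g) = inf{t > 0 : sum_i w_i phi(psi(|f_i - g_i|)/t) <= 1}.

    Finite measure space: point masses `weights` on {0, ..., n-1}. The sum is
    nonincreasing in t, so bisect (on a log scale) inside an assumed bracket.
    """
    def integral(t):
        return sum(w * phi(psi(abs(a - b)) / t) for a, b, w in zip(f, g, weights))
    lo, hi = 1e-12, 1e12
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if integral(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical convex-concave pair: phi convex, psi concave, phi(0) = psi(0) = 0.
def phi(r):
    return r * r

psi = math.log1p
lam = 2.5  # the rescaled pair (phi(lam * .), psi / lam) has the same composition

def phi_lam(r):
    return phi(lam * r)

def psi_lam(r):
    return psi(r) / lam

random.seed(0)
n = 6
weights = [random.uniform(0.1, 1.0) for _ in range(n)]
worst_gap = -math.inf  # max of d(f,h) - d(f,g) - d(g,h); should stay <= 0
worst_rel_diff = 0.0   # max relative change of d(f,g) under the rescaling
for _ in range(100):
    f = [random.uniform(-3.0, 3.0) for _ in range(n)]
    g = [random.uniform(-3.0, 3.0) for _ in range(n)]
    h = [random.uniform(-3.0, 3.0) for _ in range(n)]
    d_fg = d_theta(f, g, weights, phi, psi)
    d_gh = d_theta(g, h, weights, phi, psi)
    d_fh = d_theta(f, h, weights, phi, psi)
    worst_gap = max(worst_gap, d_fh - d_fg - d_gh)
    d_fg_lam = d_theta(f, g, weights, phi_lam, psi_lam)
    worst_rel_diff = max(worst_rel_diff, abs(d_fg_lam - d_fg) / d_fg)

print(worst_gap, worst_rel_diff)
```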

Examples 2.2. If $\vartheta(r)=r^p$ for some $p\in(0,\infty)$ then

$$d_\vartheta(f,g)=\Big(\int_X|f-g|^p\,d\mu\Big)^{1/p^*}$$

with $p^*:=p$ if $p\ge1$ and $p^*:=1$ if $p<1$.
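The closed form in Examples 2.2 can be compared with a direct evaluation of the definition. In this sketch, $d_\vartheta$ is computed by bisection. For $p\ge1$ it uses the minimal factorization $(r^p,\mathrm{Id})$, and for $p<1$ it uses $(\mathrm{Id},r^p)$, both from Examples 1.3. The measure is finite, with point masses `weights`. The helper names `d_theta`, `d_power` and `closed_form` and the sample data are hypothetical.

```python
import math

def d_theta(f, g, weights, phi, psi, iters=200):
    """Bisection for inf{t > 0 : sum_i w_i phi(psi(|f_i - g_i|)/t) <= 1} (assumed bracket)."""
    def integral(t):
        return sum(w * phi(psi(abs(a - b)) / t) for a, b, w in zip(f, g, weights))
    lo, hi = 1e-12, 1e12
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if integral(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

def d_power(f, g, weights, p):
    """d_theta for theta(r) = r**p via its minimal factorization:
    (r**p, Id) if p >= 1 (theta convex), (Id, r**p) if p < 1 (theta concave)."""
    if p >= 1:
        return d_theta(f, g, weights, lambda r: r ** p, lambda r: r)
    return d_theta(f, g, weights, lambda r: r, lambda r: r ** p)

def closed_form(f, g, weights, p):
    """(sum_i w_i |f_i - g_i|^p)^(1/p*) with p* = p for p >= 1 and p* = 1 for p < 1."""
    p_star = p if p >= 1 else 1.0
    return sum(w * abs(a - b) ** p for a, b, w in zip(f, g, weights)) ** (1.0 / p_star)

f = [0.0, 1.0, -2.0, 0.5]
g = [1.0, 1.0, 1.0, -0.5]
weights = [0.5, 1.0, 0.25, 2.0]  # a finite, not necessarily probability, measure
for p in (0.5, 1.0, 2.0, 3.0):
    print(p, d_power(f, g, weights, p), closed_form(f, g, weights, p))
```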
Proposition 2.3. (i) If $\vartheta$ is convex then $\|f\|_{L^\vartheta(X,\mu)}:=d_\vartheta(f,0)$ is indeed a norm and $L^\vartheta(X,\mu)$ is a Banach space, called an Orlicz space. The norm is called the Luxemburg norm.

(ii) If $\vartheta$ is concave then

$$d_\vartheta(f,g)=\int_X\vartheta(|f-g|)\,d\mu\ge\big\|\vartheta(|f|)-\vartheta(|g|)\big\|_{L^1(X,\mu)}.$$

(iii) For a general ccc function $\vartheta=\varphi\circ\psi$

$$d_\vartheta(f,g)=\big\|\psi(|f-g|)\big\|_{L^\varphi(X,\mu)}.$$
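(iv) If $\mu(X)=1$ then for each strictly increasing, convex function $\Phi:\mathbb{R}_+\to\mathbb{R}_+$ with $\Phi(1)=1$

$$d_{\Phi\circ\vartheta}(f,g)\ge d_\vartheta(f,g)$$

("Jensen's inequality"). Here $d_{\Phi\circ\vartheta}$ is defined as in Theorem 2.1 with the pair $(\Phi\circ\varphi,\psi)$ in place of $(\varphi,\psi)$.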

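Proof. (i) If $\vartheta$ is convex, its minimal factorization is $(\vartheta,\mathrm{Id})$ up to the linear change of variables above, i.e. $\psi(r)=cr$. Then obviously $d_\vartheta(\lambda f,0)=|\lambda|\cdot d_\vartheta(f,0)$ for $\lambda\in\mathbb{R}$. See also standard literature [2].

(ii) For concave $\vartheta$ the minimal factorization is $(\mathrm{Id},\vartheta)$, which gives the equality. Concavity of $\vartheta$ (together with $\vartheta(0)=0$) implies $\vartheta(|f-g|)\ge|\vartheta(|f|)-\vartheta(|g|)|$, which gives the inequality.

(iii) This is immediate from the definition of $d_\vartheta$ and of the Luxemburg norm in $L^\varphi(X,\mu)$.

(iv) Assume that $d_{\Phi\circ\vartheta}(f,g)<t$ for some $t\in(0,\infty)$. It implies

$$\int_X\Phi\Big(\varphi\Big(\frac1t\psi(|f-g|)\Big)\Big)\,d\mu\le1.$$

The classical Jensen inequality for integrals yields

$$\Phi\Big(\int_X\varphi\Big(\frac1t\psi(|f-g|)\Big)\,d\mu\Big)\le1,$$

which, due to the fact that $\Phi^{-1}(1)=1$, in turn implies $d_\vartheta(f,g)\le t$. □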
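Parts (ii) and (iv) of Proposition 2.3 can be illustrated on a finite probability space. This sketch is only a numerical illustration. Its assumptions are the random data, the pair $\varphi(r)=r^2$, $\psi(r)=\log(1+r)$, and $\Phi(s)=s^3$. As in the proof of (iv), $d_{\Phi\circ\vartheta}$ is evaluated with the pair $(\Phi\circ\varphi,\psi)$. The bisection helper `d_theta` is the same hypothetical helper as in the sketches above.

```python
import math
import random

def d_theta(f, g, weights, phi, psi, iters=200):
    """Bisection for inf{t > 0 : sum_i w_i phi(psi(|f_i - g_i|)/t) <= 1} (assumed bracket)."""
    def integral(t):
        return sum(w * phi(psi(abs(a - b)) / t) for a, b, w in zip(f, g, weights))
    lo, hi = 1e-12, 1e12
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if integral(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

random.seed(2)
n = 8
raw = [random.uniform(0.1, 1.0) for _ in range(n)]
total = sum(raw)
weights = [w / total for w in raw]  # probability measure: mu(X) = 1
f = [random.uniform(-2.0, 2.0) for _ in range(n)]
g = [random.uniform(-2.0, 2.0) for _ in range(n)]

# (ii) concave theta = sqrt with minimal factorization (Id, sqrt).
d_sqrt = d_theta(f, g, weights, lambda r: r, math.sqrt)
int_sqrt = sum(w * math.sqrt(abs(a - b)) for a, b, w in zip(f, g, weights))
l1_gap = sum(w * abs(math.sqrt(abs(a)) - math.sqrt(abs(b))) for a, b, w in zip(f, g, weights))

# (iv) hypothetical pair phi(r) = r^2, psi = log1p and Phi(s) = s^3 (convex, Phi(1) = 1).
d_inner = d_theta(f, g, weights, lambda r: r ** 2, math.log1p)
d_outer = d_theta(f, g, weights, lambda r: (r ** 2) ** 3, math.log1p)
print(d_sqrt, int_sqrt, l1_gap, d_inner, d_outer)
```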

3 The $L^\vartheta$-Wasserstein Space

Let $(X,d)$ be a complete separable metric space and $\vartheta$ a ccc function with minimal factorization $(\varphi,\psi)$. The $L^\vartheta$-Wasserstein space $\mathcal{P}_\vartheta(X)$ is defined as the space of all probability measures $\mu$ on $X$ (equipped with its Borel $\sigma$-field) s.t.

$$\int_X\varphi\Big(\frac1t\psi(d(x,y))\Big)\,d\mu(x)<\infty$$

for some $y\in X$ and some $t\in(0,\infty)$. The $L^\vartheta$-Wasserstein distance of two probability measures $\mu,\nu\in\mathcal{P}_\vartheta(X)$ is defined as

$$W_\vartheta(\mu,\nu)=\inf\Big\{t>0:\inf_{q\in\Pi(\mu,\nu)}\int_{X\times X}\varphi\Big(\frac1t\psi(d(x,y))\Big)\,dq(x,y)\le1\Big\},$$

where $\Pi(\mu,\nu)$ denotes the set of all couplings of $\mu$ and $\nu$, i.e. the set of all probability measures $q$ on $X\times X$ s.t. $q(A\times X)=\mu(A)$ and $q(X\times A)=\nu(A)$ for all Borel sets $A\subset X$.
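For uniform measures on finitely many points of the real line, $W_\vartheta$ can be computed by brute force. This sketch assumes $\mu=\frac1n\sum_i\delta_{x_i}$ and $\nu=\frac1n\sum_j\delta_{y_j}$ with $d(x,y)=|x-y|$ and small $n$. For fixed $t$ the transport cost is linear in $q$, and couplings of two such measures are doubly stochastic matrices divided by $n$. By the Birkhoff-von Neumann theorem the inner infimum is therefore attained at a permutation. The outer infimum over $t$ is then found by bisection. The function name `w_theta_uniform`, the bracket and the sample points are hypothetical.

```python
import itertools
import math

def w_theta_uniform(xs, ys, phi, psi, iters=200):
    """Approximate W_theta between (1/n) sum_i delta_{x_i} and (1/n) sum_j delta_{y_j}
    on the real line with d(x, y) = |x - y| (brute force over permutations, small n)."""
    n = len(xs)
    perms = list(itertools.permutations(range(n)))

    def best_cost(t):
        # min over permutation couplings of int c_t dq with c_t = phi(psi(d)/t)
        return min(
            sum(phi(psi(abs(xs[i] - ys[j])) / t) for i, j in enumerate(perm)) / n
            for perm in perms
        )

    lo, hi = 1e-12, 1e12  # assumed bracket; best_cost is nonincreasing in t
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if best_cost(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

# theta(r) = r^2 (convex, factorization (r^2, Id)): the classical L^2-Wasserstein distance.
w_quad = w_theta_uniform([0.0, 1.0, 2.0], [0.5, 1.5, 3.0], lambda r: r * r, lambda r: r)
# theta(r) = sqrt(r) (concave, factorization (Id, sqrt)).
w_sqrt = w_theta_uniform([0.0, 1.0], [1.0, 3.0], lambda r: r, math.sqrt)
print(w_quad, w_sqrt)
```

In the concave example the optimal permutation is not the monotone one: it sends $0\mapsto3$ and $1\mapsto1$, with cost $\sqrt3/2$.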

Given two probability measures $\mu,\nu\in\mathcal{P}_\vartheta(X)$, a coupling $q$ of them is called optimal iff

$$\int_{X\times X}\varphi\Big(\frac1w\psi(d(x,y))\Big)\,dq(x,y)\le1$$

for $w:=W_\vartheta(\mu,\nu)$.

Proposition 3.1. For each pair of probability measures $\mu,\nu\in\mathcal{P}_\vartheta(X)$ there exists an optimal coupling $q$.

Proof. For $t\in(0,\infty)$ define the cost function $c_t(x,y):=\varphi\big(\frac1t\psi(d(x,y))\big)$. Note that $t\mapsto c_t(x,y)$ is continuous and decreasing.

Let $\mu,\nu$ be given s.t. $w:=W_\vartheta(\mu,\nu)<\infty$. Then for all $t>w$ the measures $\mu$ and $\nu$ have finite $c_t$-transportation costs. More precisely,

$$\inf_{q\in\Pi(\mu,\nu)}\int_{X\times X}c_t(x,y)\,dq(x,y)\le1.$$

Hence, there exists $q_n\in\Pi(\mu,\nu)$ s.t.

$$\int_{X\times X}c_{w+\frac1n}(x,y)\,dq_n(x,y)\le1+\frac1n.$$

In particular, $\int_{X\times X}c_{w+1}(x,y)\,dq_n(x,y)\le2$ for all $n\in\mathbb{N}$. Hence, the family $(q_n)_n$ is tight ([3], Lemma 4.4). Therefore, there exists a converging subsequence $(q_{n_k})_k$ with limit $q\in\Pi(\mu,\nu)$ satisfying

$$\int_{X\times X}c_{w+\frac1n}(x,y)\,dq(x,y)\le1+\frac1n$$

for all $n$ ([3], Lemma 4.3) and thus, by monotone convergence as $n\to\infty$,

$$\int_{X\times X}c_w(x,y)\,dq(x,y)\le1.$$

□

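Proposition 3.2. $W_\vartheta$ is a complete metric on $\mathcal{P}_\vartheta(X)$.

The triangle inequality for $W_\vartheta$ is valid not only on $\mathcal{P}_\vartheta(X)$ but on the whole space $\mathcal{P}(X)$ of probability measures on $X$. The triangle inequality implies that $W_\vartheta(\mu,\nu)<\infty$ for all $\mu,\nu\in\mathcal{P}_\vartheta(X)$ (compare $\mu$ and $\nu$ with Dirac measures).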

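Proof. Let three probability measures $\mu_1,\mu_2,\mu_3$ on $X$ and numbers $r,s$ be given with $W_\vartheta(\mu_1,\mu_2)<r$ and $W_\vartheta(\mu_2,\mu_3)<s$. Then there exist a coupling $q_{12}$ of $\mu_1$ and $\mu_2$ and a coupling $q_{23}$ of $\mu_2$ and $\mu_3$ s.t.

$$\int\varphi\Big(\frac1r\psi\circ d\Big)\,dq_{12}\le1,\qquad\int\varphi\Big(\frac1s\psi\circ d\Big)\,dq_{23}\le1.$$

Let $q_{123}$ be the gluing of the two couplings $q_{12}$ and $q_{23}$, see e.g. [1], Lemma 11.8.3. That is, $q_{123}$ is a probability measure on $X\times X\times X$ s.t. the projection onto the first two factors coincides with $q_{12}$ and the projection onto the last two factors coincides with $q_{23}$. Let $q_{13}$ denote the projection of $q_{123}$ onto the first and third factor. In particular, this will be a coupling of $\mu_1$ and $\mu_3$. Then for $t:=r+s$

$$\begin{aligned}\int_{X\times X}\varphi\Big(\frac1t\psi(d(x,z))\Big)\,dq_{13}(x,z)&\le\int_{X\times X\times X}\varphi\Big(\frac1t\psi\big(d(x,y)+d(y,z)\big)\Big)\,dq_{123}(x,y,z)\\&\le\int_{X\times X\times X}\varphi\Big(\frac rt\cdot\frac{\psi(d(x,y))}r+\frac st\cdot\frac{\psi(d(y,z))}s\Big)\,dq_{123}(x,y,z)\\&\le\frac rt\int_{X\times X\times X}\varphi\Big(\frac{\psi(d(x,y))}r\Big)\,dq_{123}(x,y,z)+\frac st\int_{X\times X\times X}\varphi\Big(\frac{\psi(d(y,z))}s\Big)\,dq_{123}(x,y,z)\\&\le\frac rt\cdot1+\frac st\cdot1=1.\end{aligned}$$

Hence, $W_\vartheta(\mu_1,\mu_3)\le t$. This proves the triangle inequality.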

To prove completeness, assume that $(\mu_k)_k$ is a $W_\vartheta$-Cauchy sequence, say $W_\vartheta(\mu_n,\mu_k)<t_n$ for all $k>n$ with $t_n\to0$ as $n\to\infty$. Then there exist couplings $q_{n,k}$ of $\mu_n$ and $\mu_k$ s.t.

$$\int_{X\times X}\varphi\Big(\frac1{t_n}\psi(d(x,y))\Big)\,dq_{n,k}(x,y)\le1. \qquad (3)$$

Jensen's inequality implies

$$\int_{X\times X}\tilde d(x,y)\,dq_{n,k}(x,y)\le t_n\cdot\varphi^{-1}(1)$$

with $\tilde d(x,y):=\psi(d(x,y))$. The latter is a complete metric on $X$ with the same topology as $d$. That is, $(\mu_k)_k$ is a Cauchy sequence w.r.t. the $L^1$-Wasserstein distance on $\mathcal{P}(X,\tilde d)$. Because of the completeness of $\mathcal{P}_1(X,\tilde d)$, we thus obtain an accumulation point $\mu$ and a converging subsequence $(\mu_{k_i})_i$. According to [3], Lemma 4.4, this also yields an accumulation point $q_n$ of the sequence $(q_{n,k_i})_i$. Continuity of the involved cost functions, together with Fatou's lemma, allows us to pass to the limit in (3) to derive

$$\int_{X\times X}\varphi\Big(\frac1{t_n}\psi(d(x,y))\Big)\,dq_n(x,y)\le1,$$
which proves that $W_\vartheta(\mu,\mu_n)\le t_n\to0$ as $n\to\infty$.

With a similar argument, one verifies that $W_\vartheta(\mu,\nu)=0$ if and only if $\mu=\nu$. □
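In the discrete case, the gluing used in the proof of Proposition 3.2 has the explicit form $q_{123}(x,y,z)=q_{12}(x,y)\,q_{23}(y,z)/\mu_2(y)$. The sketch below uses hypothetical data on three points of $\mathbb{R}$ and the pair $\varphi(r)=r^2$, $\psi(r)=\log(1+r)$. It builds $q_{123}$ and checks its two-dimensional marginals. It then evaluates both sides of the cost estimate from the proof with $t=r+s$. The helper names `glue`, `project` and `cost` are hypothetical.

```python
import math

def glue(q12, q23):
    """Discrete gluing: q123[i][j][k] = q12[i][j] * q23[j][k] / mu2[j].

    Assumes the second marginal of q12 equals the first marginal of q23 (= mu2);
    entries with mu2[j] = 0 are set to 0.
    """
    n1, n2, n3 = len(q12), len(q23), len(q23[0])
    mu2 = [sum(q12[i][j] for i in range(n1)) for j in range(n2)]
    return [[[q12[i][j] * q23[j][k] / mu2[j] if mu2[j] > 0 else 0.0
              for k in range(n3)] for j in range(n2)] for i in range(n1)]

def project(q123, axes):
    """Marginal of q123 on two factors; axes is one of (0, 1), (1, 2), (0, 2)."""
    sizes = (len(q123), len(q123[0]), len(q123[0][0]))
    out = [[0.0] * sizes[axes[1]] for _ in range(sizes[axes[0]])]
    for i in range(sizes[0]):
        for j in range(sizes[1]):
            for k in range(sizes[2]):
                idx = (i, j, k)
                out[idx[axes[0]]][idx[axes[1]]] += q123[i][j][k]
    return out

pts = [0.0, 1.0, 2.5]  # common support of mu_1, mu_2, mu_3 (hypothetical data)
q12 = [[0.2, 0.1, 0.0], [0.0, 0.3, 0.1], [0.1, 0.0, 0.2]]  # column sums (0.3, 0.4, 0.3)
q23 = [[0.1, 0.2, 0.0], [0.2, 0.0, 0.2], [0.0, 0.1, 0.2]]  # row sums (0.3, 0.4, 0.3)
q123 = glue(q12, q23)
q13 = project(q123, (0, 2))

def phi(r):
    return r * r

psi = math.log1p

def cost(q, t):
    """int phi(psi(d)/t) dq for a coupling q on pts x pts."""
    return sum(q[a][b] * phi(psi(abs(pts[a] - pts[b])) / t)
               for a in range(len(pts)) for b in range(len(pts)))

r, s = 0.7, 1.3
t = r + s
lhs = cost(q13, t)
rhs = (r / t) * cost(q12, r) + (s / t) * cost(q23, s)
print(lhs, rhs)
```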

Remark 3.3. For each pair of probability measures $\mu,\nu$ on $X$

$$W_\vartheta(\mu,\nu)\le1\iff\inf_{q\in\Pi(\mu,\nu)}\int_{X\times X}\vartheta(d(x,y))\,dq(x,y)\le1.$$
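Remark 3.3 can be tested in the discrete setting of uniform $n$-point measures on the line, where couplings reduce to permutations. This sketch is illustrative, with hypothetical helper names and data. For several scalings of the target points, it compares whether $W_\vartheta\le1$ with whether the minimal $\vartheta$-transport cost is at most $1$. It uses the hypothetical pair $\varphi(r)=r^2$, $\psi(r)=\log(1+r)$, i.e. $\vartheta(r)=\log(1+r)^2$. The equivalence only uses $\varphi(\psi(r))=\vartheta(r)$ at $t=1$, so the pair need not be minimal here.

```python
import itertools
import math

def best_cost(xs, ys, phi, psi, t):
    """min over permutation couplings of (1/n) sum_i phi(psi(|x_i - y_perm(i)|)/t)."""
    n = len(xs)
    return min(sum(phi(psi(abs(xs[i] - ys[j])) / t) for i, j in enumerate(perm)) / n
               for perm in itertools.permutations(range(n)))

def w_theta_uniform(xs, ys, phi, psi, iters=200):
    """Bisection for W_theta between uniform n-point measures on the line (small n)."""
    lo, hi = 1e-12, 1e12  # assumed bracket
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if best_cost(xs, ys, phi, psi, mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

def phi(r):
    return r * r

psi = math.log1p
xs = [0.0, 1.0, 2.0]
results = []  # (scale, W_theta <= 1, inf_q int theta(d) dq <= 1)
for scale in (0.5, 1.0, 2.0, 4.0, 8.0):
    ys = [scale * v for v in (0.5, 1.5, 3.0)]
    w = w_theta_uniform(xs, ys, phi, psi)
    theta_cost = best_cost(xs, ys, phi, psi, 1.0)  # t = 1 gives the theta-cost
    results.append((scale, w <= 1.0 + 1e-12, theta_cost <= 1.0))
print(results)
```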